A Bayesian Perspective on Training Speed and Model Selection

Neural Information Processing Systems

We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its marginal likelihood. Second, that this measure, under certain conditions, predicts the relative weighting of models in linear model combinations trained to minimize a regression loss. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. Our results suggest a promising new direction towards explaining why neural networks trained with stochastic gradient descent are biased towards functions that generalize well.
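The link between training speed and marginal likelihood rests on the chain rule of probability, log p(D) = Σ_i log p(d_i | d_1, …, d_{i-1}), which writes the log evidence as a sum of one-step-ahead predictive log-losses accumulated while fitting the data in sequence. A minimal sketch for a Bayesian linear model (dimensions, noise level, and prior scale below are illustrative choices, not values from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
n, d, alpha, sigma2 = 20, 3, 1.0, 0.1  # illustrative sizes and hyperparameters

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + np.sqrt(sigma2) * rng.standard_normal(n)

# Under prior w ~ N(0, alpha I) and noise N(0, sigma2), the marginal of y is
# Gaussian with covariance K = alpha * X X^T + sigma2 * I.
K = alpha * X @ X.T + sigma2 * np.eye(n)

# Direct log marginal likelihood.
log_ml_direct = multivariate_normal(mean=np.zeros(n), cov=K).logpdf(y)

# Chain-rule decomposition: log p(y) = sum_i log p(y_i | y_{<i}),
# each conditional obtained by Gaussian conditioning on the points seen so far.
log_ml_seq = 0.0
for i in range(n):
    if i == 0:
        mu, var = 0.0, K[0, 0]
    else:
        sol = np.linalg.solve(K[:i, :i], K[:i, i])
        mu = sol @ y[:i]
        var = K[i, i] - sol @ K[:i, i]
    log_ml_seq += norm(loc=mu, scale=np.sqrt(var)).logpdf(y[i])

print(np.isclose(log_ml_direct, log_ml_seq))  # the two computations agree
```

Each conditional term is the model's predictive loss on the next data point given what it has already absorbed, which is why a model that "trains fast" (accrues low sequential prediction loss) also has high marginal likelihood.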


Review for NeurIPS paper: A Bayesian Perspective on Training Speed and Model Selection

Neural Information Processing Systems

Weaknesses: At Eq. 5, the authors introduce two sampling-based estimators of the lower bound (LB). I am not sure why the authors introduced both as estimators for the LB: the second estimator is an unbiased estimator of the (log) marginal likelihood (ML). Though it could technically be considered a biased estimator of the LB, I do not see why it should be introduced as such, since it is an unbiased estimator of the exact quantity the authors are hoping to approximate. Indeed, in the following sentence the authors write that the second estimator's bias decreases as J is increased, which is very much expected, if not almost trivial, given the point above. Another point is that when J = 1 the two estimators are algebraically the same, so the first one also becomes a (noisy) unbiased estimator of the ML.
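The relationship between the two estimators the reviewer describes can be illustrated with a minimal sketch (the likelihood samples below are hypothetical placeholders, not the paper's actual model): averaging log-likelihoods lower-bounds the log of the averaged likelihoods by Jensen's inequality, and the two coincide algebraically when J = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 64
# Hypothetical per-sample likelihood values p(D | theta_j), theta_j drawn
# from the prior; log-normal here purely for illustration.
likelihoods = np.exp(rng.normal(loc=-2.0, scale=0.5, size=J))

# Estimator 1: average of logs -- a lower bound, by Jensen's inequality.
est_lb = np.mean(np.log(likelihoods))
# Estimator 2: log of the average -- the log of an unbiased ML estimate.
est_ml = np.log(np.mean(likelihoods))

assert est_lb <= est_ml  # Jensen's inequality: E[log x] <= log E[x]

# With J = 1 the two estimators are algebraically identical.
single = likelihoods[:1]
assert np.isclose(np.log(single).mean(), np.log(single.mean()))
```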


Review for NeurIPS paper: A Bayesian Perspective on Training Speed and Model Selection

Neural Information Processing Systems

The work considers SGD training for Bayesian linear models and illustrates a connection between training speed and generalization, and why SGD tends to select simpler models. In particular, the work shows that a particular type of posterior sampling from gradient descent yields the same model rankings as those based on the true posterior, under suitable assumptions. Experiments on deep nets are also presented. The reviewers liked the work overall, but felt that some aspects of the exposition were unclear, that the transition to and implications for deep nets are not quite convincing, especially since there is now a better understanding of both optimization and generalization in deep nets, and that baseline comparisons (e.g., SGLD, L2 regularization, dropout) would strengthen the work.


Generalization Through the Lens of Learning Dynamics

Lyle, Clare

arXiv.org Artificial Intelligence

A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.